
Keyword Search Result

[Keyword] interconnection network (96 hits)

21-40 of 96 hits

  • Cyclic Vertex Connectivity of Trivalent Cayley Graphs

    Jenn-Yang KE  

     
    PAPER-Fundamentals of Information Systems

      Publicized:
    2018/03/30
      Vol:
    E101-D No:7
      Page(s):
    1828-1834

    A vertex subset F ⊆ V(G) is called a cyclic vertex-cut set of a connected graph G if G-F is disconnected and at least two components of G-F contain cycles. The cyclic vertex connectivity is the cardinality of a minimum cyclic vertex-cut set. In this paper, we show that the cyclic vertex connectivity of the trivalent Cayley graphs TGn is equal to eight for n ≥ 4.
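
    The definition can be checked mechanically on small graphs. The sketch below (hypothetical helper names; brute force, so only practical for tiny graphs and not how the TGn result is obtained) tests whether a vertex set is a cyclic vertex-cut, using the fact that a connected simple component contains a cycle exactly when it has at least as many edges as vertices.

```python
from collections import defaultdict
from itertools import combinations


def components(vertices, edges):
    """Connected components of an undirected simple graph, as a list of vertex sets."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, comps = set(), []
    for start in vertices:
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], set()
        while stack:
            x = stack.pop()
            comp.add(x)
            for y in adj[x]:
                if y not in seen:
                    seen.add(y)
                    stack.append(y)
        comps.append(comp)
    return comps


def is_cyclic_vertex_cut(vertices, edges, cut):
    """True if G - cut is disconnected and at least two components contain cycles."""
    rest = [v for v in vertices if v not in cut]
    sub = [(u, v) for u, v in edges if u not in cut and v not in cut]
    comps = components(rest, sub)
    if len(comps) < 2:
        return False
    # A connected simple graph contains a cycle iff |E| >= |V|.
    cyclic = sum(1 for comp in comps
                 if sum(1 for u, _ in sub if u in comp) >= len(comp))
    return cyclic >= 2


def cyclic_vertex_connectivity(vertices, edges):
    """Size of a minimum cyclic vertex-cut by exhaustive search (exponential time)."""
    for k in range(len(vertices) + 1):
        for cut in combinations(vertices, k):
            if is_cyclic_vertex_cut(vertices, edges, set(cut)):
                return k
    return None  # the graph has no cyclic vertex-cut
```

    For example, two triangles joined through a single middle vertex have cyclic vertex connectivity 1: removing the middle vertex leaves two cyclic components, whereas removing a triangle vertex leaves only one.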

  • A Static Packet Scheduling Approach for Fast Collective Communication by Using PSO

    Takashi YOKOTA  Kanemitsu OOTSU  Takeshi OHKAWA  

     
    PAPER-Interconnection networks

      Publicized:
    2017/07/14
      Vol:
    E100-D No:12
      Page(s):
    2781-2795

    The interconnection network is an indispensable component of parallel computers, since it is responsible for the communication capabilities of the system. It affects system-level performance as well as the physical and logical structure of the system. Although many studies on enhancing interconnection network technology have been reported, many issues remain to be discussed. One of the most important is congestion management. In an interconnection network, many packets are transferred simultaneously and interfere with each other, and congestion arises as a result of this interference. Congestion spreads quickly, seriously degrades communication performance, and persists for a long time. Thus, the network should be controlled appropriately to suppress congestion and maintain maximum performance. Many studies address this problem and present effective methods; however, the maximum performance achievable in an ideal situation has not been sufficiently clarified. Determining this ideal performance is, in general, an NP-hard problem. This paper introduces the particle swarm optimization (PSO) methodology to overcome the problem. We first formalize the optimization problem in a form suitable for PSO and present a simple PSO application as naive models. We then discuss reducing the size of the search space and introduce three practical variations of the PSO computation model: the repetitive model, the expansion model, and the coding model. We also introduce some non-PSO methods for comparison. Our evaluation results reveal the high potential of the PSO method: the repetitive and expansion models accelerate collective communication by up to 1.72 times compared with the bursty communication condition.
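
    To make the PSO idea concrete, here is a minimal global-best PSO over packet injection times. The cost model is a deliberately simplified assumption (all packets share a single link and are served in injection order), not the paper's formalization, and the function names and the inertia and acceleration constants (w, c1, c2) are illustrative textbook choices rather than values from the paper.

```python
import random


def makespan(inject_times, lengths):
    """Toy cost (assumed model): packets share one link and are served in injection order."""
    t = 0.0
    for x, size in sorted(zip(inject_times, lengths)):
        t = max(t, x) + size  # wait until injected, then occupy the link
    return t


def pso_schedule(lengths, horizon=10.0, particles=20, iters=100, seed=1,
                 w=0.7, c1=1.5, c2=1.5):
    """Global-best PSO searching injection times in [0, horizon] that minimize makespan."""
    rng = random.Random(seed)
    n = len(lengths)
    pos = [[rng.uniform(0, horizon) for _ in range(n)] for _ in range(particles)]
    vel = [[0.0] * n for _ in range(particles)]
    pbest = [p[:] for p in pos]
    pbest_cost = [makespan(p, lengths) for p in pos]
    g = min(range(particles), key=lambda i: pbest_cost[i])
    gbest, gbest_cost = pbest[g][:], pbest_cost[g]
    initial_cost = gbest_cost
    for _ in range(iters):
        for i in range(particles):
            for d in range(n):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Clamp so every injection time stays inside the scheduling horizon.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], 0.0), horizon)
            cost = makespan(pos[i], lengths)
            if cost < pbest_cost[i]:
                pbest[i], pbest_cost[i] = pos[i][:], cost
                if cost < gbest_cost:
                    gbest, gbest_cost = pos[i][:], cost
    return gbest, gbest_cost, initial_cost
```

    In this toy model any schedule without idle gaps is optimal (the cost equals the sum of packet lengths); a realistic cost function would simulate contention over the many links of a real topology, which is where the NP-hardness comes from.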

  • Implementing Exchanged Hypercube Communication Patterns on Ring-Connected WDM Optical Networks

    Yu-Liang LIU  Ruey-Chyi WU  

     
    PAPER-Interconnection networks

      Publicized:
    2017/08/04
      Vol:
    E100-D No:12
      Page(s):
    2771-2780

    The exchanged hypercube, denoted by EH(s,t), is a graph obtained by systematically removing edges from the corresponding hypercube while preserving many of the hypercube's attractive properties. Moreover, the ring-connected topology is one of the most promising topologies for Wavelength Division Multiplexing (WDM) optical networks. Let Rn denote a ring-connected topology. In this paper, we address the routing and wavelength assignment problem for implementing the EH(s,t) communication pattern on Rn, where n=s+t+1. We design an embedding scheme and, based on it, propose a near-optimal wavelength assignment algorithm that uses 2^{s+t-2}+⌊2^t/3⌋ wavelengths. We also show that the algorithm uses no more than 25 percent (or ⌊2^{t-1}/3⌋) additional wavelengths compared with an optimal wavelength assignment algorithm.
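
    A minimal sketch of the objects involved, assuming the usual definition of EH(s,t) on (s+t+1)-bit labels (a selector bit c, an s-bit part a, and a t-bit part b; the bit layout below is an assumption). It builds the edge set and, for a naive identity embedding with shortest-arc routing on a 2^n-node ring, reports the maximum link load, which is a lower bound on the wavelengths that routing needs. This is not the paper's embedding scheme.

```python
def exchanged_hypercube(s, t):
    """Edge set of EH(s, t) on (s+t+1)-bit labels.

    Assumed bit layout: bit 0 is the selector c, bits 1..t form part b,
    bits t+1..t+s form part a. Edges: flip c (always); flip one bit of a
    when c = 1; flip one bit of b when c = 0.
    """
    n = s + t + 1
    edges = set()
    for v in range(1 << n):
        nbrs = [v ^ 1]
        if v & 1:
            nbrs += [v ^ (1 << (t + 1 + i)) for i in range(s)]
        else:
            nbrs += [v ^ (1 << (1 + j)) for j in range(t)]
        for u in nbrs:
            edges.add((min(u, v), max(u, v)))
    return edges


def max_ring_load(edges, n):
    """Max number of requests sharing one link of a 2**n-node ring.

    Uses the identity embedding (node v at ring position v) and shortest-arc
    routing; any wavelength assignment for this routing needs at least this
    many wavelengths.
    """
    N = 1 << n
    load = [0] * N  # load[i] counts paths using ring link (i, i+1 mod N)
    for u, v in edges:
        cw = (v - u) % N
        start, hops = (u, cw) if cw <= N - cw else (v, N - cw)
        for k in range(hops):
            load[(start + k) % N] += 1
    return max(load)
```

    EH(s,t) keeps 2^{s+t} + (s+t)·2^{s+t-1} of the (s+t+1)·2^{s+t} hypercube edges. EH(1,1) is an 8-cycle, so the identity embedding's load of 4 is far from the load of 1 that a cycle-following embedding achieves, which illustrates why the embedding scheme matters.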

  • A Layout-Oriented Routing Method for Low-Latency HPC Networks

    Ryuta KAWANO  Hiroshi NAKAHARA  Ikki FUJIWARA  Hiroki MATSUTANI  Michihiro KOIBUCHI  Hideharu AMANO  

     
    PAPER-Interconnection networks

      Publicized:
    2017/07/14
      Vol:
    E100-D No:12
      Page(s):
    2796-2807

    End-to-end network latency has become an important issue for parallel applications on large-scale high-performance computing (HPC) systems. It has been reported that randomly connected inter-switch networks can lower end-to-end network latency. This latency reduction, however, comes at the cost of a large amount of routing information: minimal routing on irregular networks is achieved by using routing tables with entries for all destinations in the network. In this work, we propose a novel distributed routing method called LOREN (Layout-Oriented Routing with Entries for Neighbors) that achieves low latency with small routing tables on irregular networks whose link length is limited. The routing tables contain both physically and topologically nearby neighbor nodes to ensure livelock freedom and a small number of hops between nodes. Experimental results show that LOREN reduces the average latency by 5.8% and improves network throughput by up to 62% compared with a conventional compact routing method. Moreover, the number of required routing table entries is reduced by up to 91%, which improves scalability and flexibility for implementation.
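
    The baseline that LOREN is compared against, minimal routing with one table entry per destination, can be sketched as follows. The random topology generator (a ring plus random shortcut links) is a hypothetical stand-in for a random inter-switch network; LOREN's neighbor-entry tables are not reproduced here.

```python
import random
from collections import deque


def random_irregular_network(n, extra_links, seed=0):
    """Ring of n switches plus random shortcut links (stand-in for a random topology)."""
    rng = random.Random(seed)
    adj = {v: {(v - 1) % n, (v + 1) % n} for v in range(n)}
    for _ in range(extra_links):
        u, v = rng.sample(range(n), 2)
        adj[u].add(v)
        adj[v].add(u)
    return adj


def bfs(adj, root):
    """Return (dist, parent) maps of a breadth-first search from root."""
    dist, parent = {root: 0}, {root: None}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for w in sorted(adj[u]):
            if w not in dist:
                dist[w], parent[w] = dist[u] + 1, u
                queue.append(w)
    return dist, parent


def full_routing_tables(adj):
    """Minimal routing with one entry per destination at every switch."""
    tables = {u: {} for u in adj}
    for dst in adj:
        _, parent = bfs(adj, dst)
        for u in adj:
            if u != dst:
                tables[u][dst] = parent[u]  # neighbor one hop closer to dst
    return tables


def route(tables, src, dst):
    """Follow next-hop entries from src to dst and return the hop count."""
    hops, u = 0, src
    while u != dst:
        u = tables[u][dst]
        hops += 1
    return hops
```

    Every switch stores N-1 entries, so total table state grows as N^2; keeping entries only for nearby neighbors is what lets a compact scheme shrink the tables, at the price of possibly non-minimal paths.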

  • The Performance Evaluation of a 3D Torus Network Using Partial Link-Sharing Method in NoC Router Buffer

    Naohisa FUKASE  Yasuyuki MIURA  Shigeyoshi WATANABE  M.M. HAFIZUR RAHMAN  

     
    PAPER-Computer System

      Publicized:
    2017/06/30
      Vol:
    E100-D No:10
      Page(s):
    2478-2492

    A high-performance network-on-chip (NoC) router that uses minimal hardware resources to minimize layout area is essential for NoC design. In this paper, we propose a memory sharing method for a wormhole-routed NoC architecture to alleviate the area overhead of a NoC router. In the proposed method, a memory is shared by multiple physical links by using a multi-port memory. We propose a partial link-sharing method and evaluate the communication performance obtained with it. The results reveal that the proposed methods achieve higher communication performance than the conventional method, and that the improvement ratio for the 3D-torus network is higher than that for the 2D-torus network. The improvement in communication performance with the partial link-sharing method is achieved with only a slight increase in hardware cost.
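
    Why sharing helps can be seen in an idealized admission model, a minimal sketch assuming each group of input links shares one buffer pool whose size is the sum of the per-link buffers. The function name and the grouping used in the usage note are hypothetical; draining, arbitration, and wormhole flow control are ignored.

```python
def accepted_flits(arrivals, groups, capacity_per_link):
    """Count flits admitted when each group of links shares one buffer pool.

    arrivals: sequence of input-link ids, one per arriving flit.
    groups: partition of the link ids; a group of k links gets a pool of
    k * capacity_per_link slots. Draining, arbitration and wormhole flow
    control are deliberately ignored.
    """
    group_of = {link: gi for gi, group in enumerate(groups) for link in group}
    capacity = [len(group) * capacity_per_link for group in groups]
    used = [0] * len(groups)
    accepted = 0
    for link in arrivals:
        g = group_of[link]
        if used[g] < capacity[g]:
            used[g] += 1
            accepted += 1
    return accepted
```

    With four links of 4 slots each and a burst of 10 flits on link 0 plus 2 on link 1, private buffers admit 6 flits, pairwise (partial) sharing admits 8, and full sharing admits all 12. Smaller sharing groups need fewer ports per shared memory, which is the hardware-cost side of the trade-off.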

  • Stochastic Fault-Tolerant Routing in Dual-Cubes

    Junsuk PARK  Nobuhiro SEKI  Keiichi KANEKO  

     
    LETTER-Dependable Computing

      Publicized:
    2017/05/10
      Vol:
    E100-D No:8
      Page(s):
    1920-1921

    For interconnection network topologies, a low degree and a small diameter are desirable. For the same number of nodes, a dual-cube has almost half the degree of a hypercube, while its diameter is larger by just one. Hence, it is a promising topology for interconnection networks of massively parallel systems. We propose a stochastic fault-tolerant routing algorithm that finds a non-faulty path from a source node to a destination node in a dual-cube.
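
    The degree and diameter trade-off can be checked directly. The sketch below assumes one common encoding of the dual-cube DC(n): 2^{2n+1} nodes of degree n+1 (versus degree 2n+1 for the hypercube of the same size). It uses a plain breadth-first detour around known faults, which needs global fault information and is not the stochastic algorithm proposed in the paper.

```python
from collections import deque


def dual_cube_neighbors(v, n):
    """Neighbors of node v in DC(n), labels of 2n+1 bits.

    Assumed encoding: bit 2n is the class bit, bits n..2n-1 form part A and
    bits 0..n-1 form part B. A class-0 node moves inside its cluster by
    flipping a bit of B, a class-1 node by flipping a bit of A, and every
    node has one cross-edge that flips the class bit.
    """
    cls = (v >> (2 * n)) & 1
    offset = n if cls else 0
    return [v ^ (1 << (2 * n))] + [v ^ (1 << (offset + i)) for i in range(n)]


def shortest_path(n, src, dst, faulty=frozenset()):
    """Breadth-first search for a shortest path that avoids faulty nodes."""
    parent = {src: None}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            path = []
            while u is not None:
                path.append(u)
                u = parent[u]
            return path[::-1]
        for w in dual_cube_neighbors(u, n):
            if w not in parent and w not in faulty:
                parent[w] = u
                queue.append(w)
    return None  # dst unreachable from src


def diameter(n):
    """Largest fault-free shortest-path length over all node pairs."""
    best = 0
    for s in range(1 << (2 * n + 1)):
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for w in dual_cube_neighbors(u, n):
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        best = max(best, max(dist.values()))
    return best
```

    DC(2) has 32 nodes of degree 3 and diameter 6, while the 5-cube has degree 5 and diameter 5. If nodes 1 and 2 are faulty, node 0 can reach node 3 only by leaving its cluster twice, giving an 8-hop detour.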

  • Node-to-Node Disjoint Paths Problem in Möbius Cubes

    David KOCIK  Keiichi KANEKO  

     
    PAPER-Dependable Computing

      Publicized:
    2017/04/25
      Vol:
    E100-D No:8
      Page(s):
    1837-1843

    The Möbius cube is a variant of the hypercube. Its advantage is that it can connect the same number of nodes as a hypercube with almost half the hypercube's diameter. We propose an algorithm that solves the node-to-node disjoint paths problem in n-Möbius cubes in time polynomial in n. We prove the correctness of the algorithm and estimate its time complexity as O(n^2) and its maximum path length as 3n-5.
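
    The number of node-to-node disjoint paths can be cross-checked on small Möbius cubes with a generic max-flow computation (Menger's theorem with vertex splitting). This is a sketch under the Cull-Larson style definition with an assumed bit ordering; it only counts paths and is much slower than the paper's O(n^2) construction.

```python
from collections import deque


def mobius_neighbors(v, n, x0=0):
    """Neighbors of v in the n-dimensional x0-Möbius cube (bit 0 plays the role of x_1).

    Along dimension i, if the preceding bit (x0 for i = 0) is 0 only bit i
    flips; otherwise bits i..n-1 are all complemented.
    """
    nbrs = []
    for i in range(n):
        prev = x0 if i == 0 else (v >> (i - 1)) & 1
        mask = (1 << i) if prev == 0 else ((1 << n) - 1) ^ ((1 << i) - 1)
        nbrs.append(v ^ mask)
    return nbrs


def count_disjoint_paths(n, s, t, x0=0):
    """Maximum number of internally vertex-disjoint s-t paths (Menger, via max flow)."""
    res = {}  # residual capacity res[a][b]

    def add_arc(a, b, cap):
        res.setdefault(a, {}).setdefault(b, 0)
        res.setdefault(b, {}).setdefault(a, 0)
        res[a][b] += cap

    for v in range(1 << n):
        # Split v into (v, 0) -> (v, 1); unit capacity lets one path pass through v.
        add_arc((v, 0), (v, 1), n if v in (s, t) else 1)
        for w in mobius_neighbors(v, n, x0):
            add_arc((v, 1), (w, 0), 1)
    source, sink = (s, 1), (t, 0)
    flow = 0
    while True:
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            a = queue.popleft()
            for b, cap in res[a].items():
                if cap > 0 and b not in parent:
                    parent[b] = a
                    queue.append(b)
        if sink not in parent:
            return flow
        b = sink
        while parent[b] is not None:  # push one unit along the augmenting path
            a = parent[b]
            res[a][b] -= 1
            res[b][a] += 1
            b = a
        flow += 1
```

    Since the n-Möbius cube is n-regular and n-connected, the count is n for every pair of distinct nodes; the algorithm in the paper additionally constructs the paths explicitly and bounds their length.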

  • Novel Chip Stacking Methods to Extend Both Horizontally and Vertically for Many-Core Architectures with ThruChip Interface

    Hiroshi NAKAHARA  Tomoya OZAKI  Hiroki MATSUTANI  Michihiro KOIBUCHI  Hideharu AMANO  

     
    PAPER-Architecture

      Publicized:
    2016/08/24
      Vol:
    E99-D No:12
      Page(s):
    2871-2880

    The recent increase in non-recurring engineering cost (design, mask, and test cost) has made large systems-on-chip (SoCs) difficult to develop, especially with advanced technology. We explore a radically cheap and flexible approach to chip stacking using the inductive-coupling ThruChip Interface (TCI). To connect a large number of small chips into a large-scale system, we propose novel chip stacking methods called linear stacking and staggered stacking. They enable the system to be extended in the x and/or y dimensions, not only in the z dimension. We present the novel chip stacking layouts and their deadlock-free routing designs for both single-core and multi-core chips. Compared with a 2D mesh, a 256-node network formed by the proposed stacking reduces latency by 13.8% and improves the performance of the NAS Parallel Benchmarks by 5.4% on average.

  • Enhancing Entropy Throttling: New Classes of Injection Control in Interconnection Networks

    Takashi YOKOTA  Kanemitsu OOTSU  Takeshi OHKAWA  

     
    PAPER-Interconnection network

      Publicized:
    2016/08/25
      Vol:
    E99-D No:12
      Page(s):
    2911-2922

    State-of-the-art parallel computers, which are growing in parallelism, place many demands on their interconnection networks. Although a wide spectrum of research and development efforts toward effective and practical interconnection networks has been reported, the problem is still open. One of the largest issues is congestion control, which aims to maximize network performance in terms of throughput and latency. Throttling, or injection limitation, is one of the central ideas of congestion control. We have proposed a new class of throttling method, Entropy Throttling, which is founded on the concept of packet entropy. The method is partly successful; however, its potential has not been sufficiently discussed. This paper aims to exploit the capabilities of Entropy Throttling through a comprehensive evaluation. The major contributions of this paper are to introduce two ideas, a hysteresis function and a guard time, and to clarify the method's performance characteristics across a wide range of steady and unsteady communication situations. By introducing these ideas, we extend the Entropy Throttling method. The extended methods improve communication performance by up to 3.17 times in the best case and 1.47 times on average compared with non-throttling cases in collective communication, while sustaining steady communication performance.
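
    The two control ideas, a hysteresis function and a guard time, can be sketched independently of how the metric is computed. Everything below is an assumption for illustration: the Shannon entropy of packet counts per region stands in for the paper's entropy measure, and throttling is assumed to switch on when the metric reaches an upper level and off when it falls to a lower one.

```python
import math


def packet_entropy(counts):
    """Shannon entropy (bits) of a packet distribution, e.g. packets per region."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)


class HysteresisThrottle:
    """Two-threshold injection throttle with a minimum dwell (guard) time.

    Throttling turns on when the metric reaches on_level and off only when it
    falls to off_level; no state change happens within guard_time cycles of
    the previous one.
    """

    def __init__(self, on_level, off_level, guard_time):
        if off_level >= on_level:
            raise ValueError("off_level must be below on_level")
        self.on_level, self.off_level = on_level, off_level
        self.guard_time = guard_time
        self.throttling = False
        self.last_change = -guard_time  # allow a change at t = 0

    def update(self, t, metric):
        if t - self.last_change >= self.guard_time:
            if not self.throttling and metric >= self.on_level:
                self.throttling, self.last_change = True, t
            elif self.throttling and metric <= self.off_level:
                self.throttling, self.last_change = False, t
        return self.throttling
```

    The gap between the two levels keeps a metric hovering near a single threshold from toggling injection every cycle, and the guard time bounds how often the state can flip even when the metric swings widely.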

  • Job Mapping and Scheduling on Free-Space Optical Networks

    Yao HU  Ikki FUJIWARA  Michihiro KOIBUCHI  

     
    PAPER-Computer System

      Publicized:
    2016/08/16
      Vol:
    E99-D No:11
      Page(s):
    2694-2704

    A number of parallel applications run simultaneously on a high-performance computing (HPC) system. Job mapping and scheduling become crucial to improving system utilization, because fragmentation prevents an incoming job from being assigned even when enough compute nodes are unused. Wireless supercomputers and datacenters with free-space optical (FSO) terminals have been proposed to replace the conventional wired interconnection, so that a diverse application workload can be better supported by changing the network topology. In this study, we first present an efficient job mapping method based on swapping the endpoints of FSO links in a wireless HPC system. Our evaluation shows that an FSO-equipped wireless HPC system achieves a shorter average queue length and queuing time for all dispatched user jobs. Second, we consider a more complex, enhanced scheduling algorithm that further improves system utilization over different host networks, as well as the average response time for all dispatched user jobs. Finally, we present the performance advantages of the proposed wireless HPC system under more practical assumptions, such as different cabinet capacities and diverse subtopology packings.
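
    The fragmentation problem can be illustrated with a one-dimensional toy model (an assumption, not the paper's system model): a wired mapping that needs a contiguous block of nodes can fail even when enough nodes are free, whereas if reconfigurable FSO links are assumed to stitch any free nodes into the job's subtopology, the job fits. Both function names are hypothetical.

```python
def contiguous_first_fit(free, size):
    """First block of `size` consecutive free nodes (wired, topology-constrained mapping)."""
    run = 0
    for i, is_free in enumerate(free):
        run = run + 1 if is_free else 0
        if run == size:
            return list(range(i - size + 1, i + 1))
    return None


def any_free_nodes(free, size):
    """Any `size` free nodes (assumes reconfigurable FSO links can join them)."""
    nodes = [i for i, is_free in enumerate(free) if is_free]
    return nodes[:size] if len(nodes) >= size else None
```

    With free pattern [T, F, T, F, T, T, F, T], a 4-node job cannot be placed contiguously although five nodes are idle, but it can be mapped onto nodes 0, 2, 4, and 5 once the topology is reconfigurable.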

  • Layout-Conscious Expandable Topology for Low-Degree Interconnection Networks

    Thao-Nguyen TRUONG  Khanh-Van NGUYEN  Ikki FUJIWARA  Michihiro KOIBUCHI  

     
    PAPER-Computer System

      Publicized:
    2016/02/02
      Vol:
    E99-D No:5
      Page(s):
    1275-1284

    System expandability has become a major concern for highly parallel computers and data centers, because their number of nodes gradually increases year by year. In this context, we propose a low-degree topology and a floor layout in which a cabinet or node set can be newly inserted by connecting short cables to a single existing cabinet. Our graph analysis shows that the proposed topology has a low diameter, a low average shortest path length, and a short average cable length, comparable to existing topologies with the same degree. When nodes and cabinets are incrementally added to the proposed topology, its diameter and average shortest path length increase only modestly. Our discrete-event simulation results show that the proposed topology provides performance comparable to a 2-D torus for some parallel applications. The network cost and power consumption of the proposed topology (DSN-F) increase modestly compared with the counterpart non-random topologies.
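
    The graph metrics used in this comparison, diameter and average shortest path length (ASPL), can be computed by breadth-first search from every node. A minimal sketch, shown on the 2-D torus baseline because the construction of the proposed topology is not given in this abstract:

```python
from collections import deque


def torus_2d(k):
    """Adjacency of a k x k 2-D torus (k >= 3)."""
    return {(x, y): {((x + 1) % k, y), ((x - 1) % k, y),
                     (x, (y + 1) % k), (x, (y - 1) % k)}
            for x in range(k) for y in range(k)}


def diameter_and_aspl(adj):
    """Diameter and average shortest path length via BFS from every node."""
    diameter, total, pairs = 0, 0, 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        if len(dist) != len(adj):
            raise ValueError("graph is disconnected")
        diameter = max(diameter, max(dist.values()))
        total += sum(dist.values())
        pairs += len(dist) - 1
    return diameter, total / pairs
```

    For a 4 x 4 torus this gives diameter 4 and ASPL 32/15 (about 2.13); any degree-4 candidate topology can be passed to the same function for a like-for-like comparison.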

  • Node-to-Set Disjoint Paths Problem in a Möbius Cube

    David KOCIK  Yuki HIRAI  Keiichi KANEKO  

     
    PAPER-Dependable Computing

      Publicized:
    2015/12/14
      Vol:
    E99-D No:3
      Page(s):
    708-713

    This paper proposes an algorithm that solves the node-to-set disjoint paths problem in an n-Möbius cube in time polynomial in n. It also gives a proof of correctness of the algorithm and estimates its time complexity, O(n^4), and its maximum path length, 2n-1. A computer experiment is conducted for n=1,2,...,31 to measure the average performance of the algorithm. The results show that the average time complexity gradually approaches O(n^3) and that the maximum path length is rarely attained over the range of n in the experiment.

  • The Fault-Tolerant Hamiltonian Problems of Crossed Cubes with Path Faults

    Hon-Chan CHEN  Tzu-Liang KUNG  Yun-Hao ZOU  Hsin-Wei MAO  

     
    PAPER-Switching System

      Publicized:
    2015/09/15
      Vol:
    E98-D No:12
      Page(s):
    2116-2122

    In this paper, we investigate the fault-tolerant Hamiltonian problems of crossed cubes with a faulty path. More precisely, let P denote any path in an n-dimensional crossed cube CQn for n ≥ 5, and let V(P) be the vertex set of P. We show that CQn-V(P) is Hamiltonian if |V(P)| ≤ n and is Hamiltonian connected if |V(P)| ≤ n-1. Compared with previous results showing that the crossed cube is (n-2)-fault-tolerant Hamiltonian and (n-3)-fault-tolerant Hamiltonian connected for arbitrary faults, our results indicate that the crossed cube can tolerate more faulty vertices if these vertices happen to form some specific types of structures.

  • The Case for Network Coding for Collective Communication on HPC Interconnection Networks

    Ahmed SHALABY  Ikki FUJIWARA  Michihiro KOIBUCHI  

     
    PAPER-Information Network

      Publicized:
    2014/12/11
      Vol:
    E98-D No:3
      Page(s):
    661-670

    Network bandwidth has recently become a performance concern, particularly for collective communication, since the bisection bandwidth of supercomputers has become far less than full bisection bandwidth. In this context, we propose using a network coding technique to reduce the number of unicasts and the amount of data transferred in latency-sensitive collective communications in supercomputers. Our network coding scheme has a hierarchical multicasting structure with intra-group and inter-group unicasts. Quantitative analysis shows that our hierarchical network coding reduces the aggregate path hop count by as much as 94% compared with conventional unicast-based multicasts. We validate these results with cycle-accurate network simulations: in 1,024-switch networks, network coding reduces the execution time of collective communications by as much as 70%. We also show that our hierarchical network coding is beneficial for any packet size.
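
    The basic mechanism behind network coding can be shown with the textbook two-way relay example, a minimal sketch that is not the paper's hierarchical scheme: nodes A and B exchange messages through relay R, and R sends one XOR-combined packet instead of two separate ones. Counting R's multicast as a single transmission is an assumption that depends on multicast support in the network.

```python
def xor_bytes(x: bytes, y: bytes) -> bytes:
    """Bytewise XOR of two equal-length messages."""
    if len(x) != len(y):
        raise ValueError("pad messages to equal length first")
    return bytes(a ^ b for a, b in zip(x, y))


def exchange_via_relay(msg_a: bytes, msg_b: bytes, coded: bool):
    """Nodes A and B swap messages through relay R.

    Returns (what A ends up with, what B ends up with, number of transmissions).
    Plain forwarding needs A->R, B->R, R->A, R->B; with coding, R multicasts
    the single packet msg_a XOR msg_b and each endpoint XORs out its own message.
    """
    if not coded:
        return msg_b, msg_a, 4
    mixed = xor_bytes(msg_a, msg_b)
    return xor_bytes(mixed, msg_a), xor_bytes(mixed, msg_b), 3
```

    Each endpoint already holds its own message, so it can cancel it out of the combined packet; applied across many destinations in a collective operation, the same principle reduces the number of unicasts and the hop count.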

  • Completely Independent Spanning Trees on Some Interconnection Networks

    Kung-Jui PAI  Jinn-Shyong YANG  Sing-Chen YAO  Shyue-Ming TANG  Jou-Ming CHANG  

     
    LETTER-Information Network

      Vol:
    E97-D No:9
      Page(s):
    2514-2517

    Let T1,T2,...,Tk be spanning trees in a graph G. If, for any two vertices u,v of G, the paths joining u and v on the k trees are pairwise openly disjoint (edge-disjoint and sharing no vertices other than u and v), then T1,T2,...,Tk are called completely independent spanning trees (CISTs for short) of G. The construction of CISTs can be applied to fault-tolerant broadcasting and secure message distribution on interconnection networks. Hasunuma (2001) first introduced the concept of CISTs and conjectured that there are k CISTs in any 2k-connected graph. Unfortunately, this conjecture was recently disproved by Péterfalvi. In this note, we give a necessary condition for k-connected k-regular graphs to have ⌊k/2⌋ CISTs. Based on this condition, we provide more counterexamples to Hasunuma's conjecture. By contrast, we show that there are two CISTs in 4-regular chordal rings CR(N,d) with N=k(d-1)+j under the condition that k ≥ 4 is even and 0 ≤ j ≤ 4. In particular, the diameter of each constructed CIST is derived.
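
    The CIST definition can be verified directly on small examples. This brute-force sketch (hypothetical function names) extracts the u-v path in every tree and checks, for every vertex pair, that the paths share no edges and no vertices other than u and v.

```python
from collections import deque
from itertools import combinations


def tree_path(edges, u, v):
    """Vertex sequence of the unique u-v path in a spanning tree given as an edge list."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)
    parent = {u: None}
    queue = deque([u])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in parent:
                parent[y] = x
                queue.append(y)
    path = [v]
    while path[-1] != u:
        path.append(parent[path[-1]])
    return path[::-1]


def are_cists(vertices, trees):
    """True if the spanning trees are completely independent."""
    for u, v in combinations(vertices, 2):
        paths = [tree_path(t, u, v) for t in trees]
        for p, q in combinations(paths, 2):
            if set(p[1:-1]) & set(q[1:-1]):  # shared internal vertex
                return False
            edges_p = {frozenset(e) for e in zip(p, p[1:])}
            edges_q = {frozenset(e) for e in zip(q, q[1:])}
            if edges_p & edges_q:  # shared edge
                return False
    return True
```

    For example, the complete graph K4 splits into the path 0-1-2-3 and the tree {0-2, 0-3, 1-3}, which are two CISTs; replacing the second tree with {0-1, 1-3, 3-2} fails because both trees use edge 0-1.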

  • Longest Fault-Free Cycles in Folded Hypercubes with Conditional Faulty Elements

    Wen-Yin HUANG  Jia-Jie LIU  Jou-Ming CHANG  Ro-Yu WU  

     
    PAPER

      Vol:
    E97-A No:6
      Page(s):
    1187-1191

    An n-dimensional folded hypercube, denoted by FQn, is an n-dimensional hypercube enhanced with one extra link between each pair of nodes at the maximum Hamming distance (i.e., complementary nodes). Let FFv (respectively, FFe) denote the set of faulty nodes (respectively, faulty links) in FQn. Under the assumption that every fault-free node in FQn is incident to at least two fault-free links, Hsieh et al. (Inform. Process. Lett. 110 (2009) pp.41-53) showed that if |FFv|+|FFe| ≤ 2n-4 for n ≥ 3, then FQn-FFv-FFe contains a fault-free cycle of length at least 2^n-2|FFv|. In this paper, we show that, under the same conditional fault model, FQn with n ≥ 5 can tolerate more faulty elements while providing the same lower bound on the length of a longest fault-free cycle; i.e., FQn-FFv-FFe contains a fault-free cycle of length at least 2^n-2|FFv| if |FFv|+|FFe| ≤ 2n-3 for n ≥ 5.
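
    A minimal sketch of the folded hypercube and of the stated bound (FQn has 2^n nodes, so the guaranteed cycle length is 2^n minus 2|FFv|). Function names are hypothetical; `guaranteed_cycle_length` only evaluates the bound and does not construct a cycle or check the conditional fault assumption.

```python
from collections import deque


def folded_hypercube(n):
    """Adjacency of FQn (n >= 2): hypercube links plus a link to the bitwise complement."""
    full = (1 << n) - 1
    return {v: [v ^ (1 << i) for i in range(n)] + [v ^ full] for v in range(1 << n)}


def eccentricity(adj, src):
    """Largest BFS distance from src (equals the diameter for vertex-transitive graphs)."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return max(dist.values())


def guaranteed_cycle_length(n, faulty_nodes, faulty_links):
    """Lower bound for n >= 5 and |FFv| + |FFe| <= 2n - 3, else None.

    The conditional-fault assumption (each fault-free node keeps at least two
    fault-free links) is not checked here.
    """
    if n >= 5 and faulty_nodes + faulty_links <= 2 * n - 3:
        return 2 ** n - 2 * faulty_nodes
    return None
```

    The complementary link raises the degree to n+1 and shortens the diameter to ⌈n/2⌉ (2 for FQ4, 3 for FQ5). For n = 5 with 3 faulty nodes and 4 faulty links (7 = 2n-3 faults), the bound guarantees a fault-free cycle of length at least 32 - 6 = 26.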

  • A Tree-Structured Deterministic Small-World Network

    Shi-Ze GUO  Zhe-Ming LU  Guang-Yu KANG  Zhe CHEN  Hao LUO  

     
    LETTER-Artificial Intelligence, Data Mining

      Vol:
    E95-D No:5
      Page(s):
    1536-1538

    The small-world property is common in many real-life social, technological, and biological networks. Small-world networks are distinguished by a high clustering coefficient and a short average path length. Over the past dozen years, many probabilistic small-world networks and some deterministic small-world networks have been proposed using various mechanisms. In this Letter, we propose a new deterministic small-world network model built by first constructing a binary tree and then adding links between each pair of sibling nodes and between each grandparent node and its four grandchild nodes. Furthermore, we give analytic solutions for several topological characteristics, which show that the proposed model is a small-world network.
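
    The construction described above is simple enough to build and measure numerically. This sketch (hypothetical function names, heap indexing for the binary tree) adds the sibling and grandparent-grandchild links and computes the two small-world indicators; it is a numerical check, not the paper's analytic derivation.

```python
from collections import deque
from itertools import combinations


def tree_small_world(depth):
    """Binary tree of the given depth (heap indexing) plus sibling and grandparent links."""
    size = 2 ** (depth + 1) - 1
    adj = {v: set() for v in range(size)}

    def link(u, v):
        adj[u].add(v)
        adj[v].add(u)

    for v in range(size):
        left, right = 2 * v + 1, 2 * v + 2
        if right < size:
            link(v, left)
            link(v, right)
            link(left, right)  # sibling ("brother") link
        for g in range(4 * v + 3, 4 * v + 7):
            if g < size:
                link(v, g)  # grandparent to grandchild
    return adj


def clustering_coefficient(adj):
    """Average local clustering coefficient (nodes with degree < 2 count as 0)."""
    total = 0.0
    for v, nbrs in adj.items():
        k = len(nbrs)
        if k >= 2:
            triangles = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
            total += triangles / (k * (k - 1) / 2)
    return total / len(adj)


def average_path_length(adj):
    """Mean shortest-path length over all ordered pairs of distinct nodes."""
    total, pairs = 0, 0
    for s in adj:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs
```

    For depth 3 the network has 15 nodes and 33 links (14 tree, 7 sibling, 12 grandparent), and the root is linked to its two children and four grandchildren.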

  • Hybrid Wired/Wireless On-Chip Network Design for Application-Specific SoC

    Shouyi YIN  Yang HU  Zhen ZHANG  Leibo LIU  Shaojun WEI  

     
    PAPER

      Vol:
    E95-C No:4
      Page(s):
    495-505

    A hybrid wired/wireless on-chip network is a promising communication architecture for multi-/many-core SoCs. For application-specific SoC design, it is important to design a dedicated on-chip network architecture tailored to the application. In this paper, we propose a heuristic wireless link allocation algorithm for creating a hybrid on-chip network architecture. The algorithm can eliminate performance bottlenecks by replacing multi-hop wired paths with high-bandwidth, single-hop, long-range wireless links. Simulation results show that the hybrid on-chip network designed by our algorithm significantly improves both communication delay and energy consumption.
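
    The idea of replacing multi-hop wired paths with single-hop wireless links can be illustrated by a simple greedy heuristic. This is a sketch under stated assumptions, not the paper's algorithm: cores sit on a 2D mesh with Manhattan routing, each selected flow gets its own wireless link, each core has a limited number of radios, and the objective is traffic volume times hop count.

```python
def mesh_hops(a, b, width):
    """Manhattan (XY-routing) hop count between cores a and b on a width-wide 2D mesh."""
    ay, ax = divmod(a, width)
    by, bx = divmod(b, width)
    return abs(ax - bx) + abs(ay - by)


def allocate_wireless(traffic, width, max_links, max_per_node=1):
    """Greedy sketch: give single-hop wireless links to the flows whose
    volume * (wired hops - 1) saving is largest, with a per-core radio limit."""
    candidates = sorted(
        ((vol * (mesh_hops(s, d, width) - 1), s, d) for (s, d), vol in traffic.items()),
        reverse=True)
    radios, links = {}, []
    for gain, s, d in candidates:
        if len(links) == max_links or gain <= 0:
            break
        if radios.get(s, 0) < max_per_node and radios.get(d, 0) < max_per_node:
            links.append((s, d))
            radios[s] = radios.get(s, 0) + 1
            radios[d] = radios.get(d, 0) + 1
    return links


def weighted_hops(traffic, width, links):
    """Total volume * hops, counting a flow with its own wireless link as one hop."""
    wireless = {frozenset(link) for link in links}
    return sum(vol * (1 if frozenset((s, d)) in wireless else mesh_hops(s, d, width))
               for (s, d), vol in traffic.items())
```

    On a 4 x 4 mesh with flows (0→15, volume 10), (3→12, volume 5), and (0→1, volume 100), the two corner-to-corner flows receive wireless links and the weighted hop count drops from 190 to 115; the heavy but adjacent flow gains nothing from a wireless link, so it is left on the wired mesh.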

  • Two-Level FIFO Buffer Design for Routers in On-Chip Interconnection Networks

    Po-Tsang HUANG  Wei HWANG  

     
    PAPER-VLSI Design Technology and CAD

      Vol:
    E94-A No:11
      Page(s):
    2412-2424

    The on-chip interconnection network (OCIN) is an integrated solution for system-on-chip (SoC) designs. However, the buffer architecture and size dominate the performance of OCINs and affect router design. This work analyzes different buffer architectures and uses a data-link two-level FIFO (first-in first-out) buffer architecture to implement high-performance routers. The concepts of shared buffers and multiple buffer accesses are developed using the two-level FIFO buffer architecture. The proposed two-level FIFO buffer architecture increases the utilization of the storage elements through a centralized buffer organization, and reduces router area and power consumption while achieving the same performance as other buffer architectures. Cycle-accurate simulation shows that the proposed data-link two-level FIFO buffer achieves performance similar to that of conventional virtual channels while using 25% of the buffers. Consequently, in UMC 65 nm CMOS technology, the two-level FIFO buffer achieves about 22% power reduction compared with conventional virtual channels of similar performance.

  • TTN: A High Performance Hierarchical Interconnection Network for Massively Parallel Computers

    M.M. Hafizur RAHMAN  Yasushi INOGUCHI  Yukinori SATO  Susumu HORIGUCHI  

     
    PAPER-Computer Systems

      Vol:
    E92-D No:5
      Page(s):
    1062-1078

    Interconnection networks play a crucial role in the performance of massively parallel computers. Hierarchical interconnection networks provide high performance at low cost by exploiting the locality that exists in the communication patterns of massively parallel computers. A Tori connected Torus Network (TTN) is a 2D-torus network of multiple basic modules, in which the basic modules are 2D-torus networks that are hierarchically interconnected for higher-level networks. This paper addresses the architectural details of the TTN and explores aspects such as node degree, network diameter, cost, average distance, arc connectivity, bisection width, and wiring complexity. We also present a deadlock-free routing algorithm for the TTN using four virtual channels and evaluate the network's dynamic communication performance using the proposed routing algorithm under uniform and various non-uniform traffic patterns. We evaluate the dynamic communication performance of TTN, TESH, MH3DT, mesh, and torus networks by computer simulation. It is shown that the TTN possesses several attractive features, including constant node degree, small diameter, low cost, small average distance, moderate (neither too low nor too high) bisection width, high throughput, and very low zero-load latency, which provide better dynamic communication performance than that of other conventional and hierarchical networks.
